Semi-Supervised Projected Clustering
نویسندگان
چکیده
Recent studies suggest that projected clusters with extremely low dimensionality exist in many real datasets. A number of projected clustering algorithms have been proposed in the past several years, but few can identify clusters with dimensionality lower than 10% of the total number of dimensions, which are commonly found in some real datasets such as gene expression profiles. In this paper we propose a new algorithm that can accurately identify projected clusters with relevant dimensions as few as 5% of the total number of dimensions. It makes use of a robust objective function that combines object clustering and dimension selection into a single optimization problem. The algorithm can also utilize domain knowledge in the form of labeled objects and labeled dimensions to improve its clustering accuracy. We believe this is the first semi-supervised projected clustering algorithm. Both theoretical analysis and experimental results show that by using a small amount of input knowledge, possibly covering only a portion of the underlying classes, the new algorithm can be further improved to accurately detect clusters with only 1% of the dimensions being relevant. The algorithm is also useful in getting a target set of clusters when there are multiple possible groupings of the objects.
منابع مشابه
Vinayaka : a Semi-supervised Projectedclusteringmethodusing Differential Evolution
Differential Evolution (DE) is an algorithm for evolutionary optimization. Clustering problems have been solved by using DE based clustering methods but these methods may fail to find clusters hidden in subspaces of high dimensional datasets. Subspace and projected clustering methods have been proposed in literature to find subspace clusters that are present in subspaces of dataset. In this pap...
متن کاملA semi-supervised approach to projected clustering with applications to microarray data
Recent studies have suggested that extremely low dimensional projected clusters exist in real datasets. Here, we propose a new algorithm for identifying them. It combines object clustering and dimension selection, and allows the input of domain knowledge in guiding the clustering process. Theoretical and experimental results show that even a small amount of input knowledge could already help de...
متن کاملExtracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve the clustering efficiency, most of the state-of-art schemes suffer from the lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
متن کاملNovel Trends in Clustering
Clustering or finding a natural grouping of a data set is essential for knowledge discovery in many applications. This chapter provides an overview on emerging trends within the vital research area of clustering including subspace and projected clustering, correlation clustering, semi-supervised clustering, spectral clustering and parameter-free clustering. To raise the awareness of the reader ...
متن کاملWised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...
متن کامل